Overview
Proactive Problem Management consists mostly of ongoing activities aimed at improving the overall availability of
services and, in turn, end-user satisfaction. The outcomes of proactive problem analysis generally trigger an
improvement initiative.
A few examples of proactive problem analysis include:
- Pattern analysis of the incident records
- Pattern analysis of maintenance records and operational logs
- Periodic reviews of major incidents
- Trend analysis of warning and exceptional events
- Review of service and operational data for any quality issues, etc.
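As a minimal sketch of pattern analysis on incident records, the Python snippet below counts incidents per category and
reports the categories that recur. The record fields (`category`, `summary`), the sample data, and the `min_count`
threshold are hypothetical; a real analysis would read records from the ticketing tool in use.

```python
from collections import Counter

# Hypothetical incident records, used only for illustration.
incidents = [
    {"id": 1, "category": "batch-failure", "summary": "Nightly load job aborted"},
    {"id": 2, "category": "disk-space", "summary": "Archive volume full"},
    {"id": 3, "category": "batch-failure", "summary": "Nightly load job aborted"},
    {"id": 4, "category": "password-reset", "summary": "User locked out"},
    {"id": 5, "category": "batch-failure", "summary": "Load job timed out"},
]


def recurring_categories(records, min_count=2):
    """Return (category, count) pairs seen at least min_count times, most frequent first."""
    counts = Counter(record["category"] for record in records)
    return [(category, n) for category, n in counts.most_common() if n >= min_count]


print(recurring_categories(incidents))
```

Categories that recur are natural candidates for a problem record and a root cause investigation.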
Proactive problem analysis can be carried out using preventive and perfective maintenance techniques. A few of these
techniques are listed below.
Preventive Maintenance Techniques For Proactive Problem Management
Preventive maintenance is the systematic inspection, detection, correction and prevention of any emerging problems
before they become actual problems. A few methods of preventive maintenance include:
Failure Mode and Effects Analysis (FMEA)
FMEA is a systematic, proactive method for evaluating processes to identify where and how they might fail, and to assess
the relative impact of different failures, in order to identify the parts of the process that are most in need of
change. The Risk Priority Number (RPN) is a numeric assessment of risk assigned to a process, or to steps in a process,
as part of FMEA: a team assigns each failure mode numeric values that quantify severity of impact, likelihood of
occurrence, and likelihood of detection. The main steps in performing an FMEA are:
Step 1: Identify potential failures and effects
Step 2: Determine severity
Step 3: Gauge the likelihood of occurrence
Step 4: Assess the likelihood of detection
Step 5: Calculate the Risk Priority Number (RPN).
RPN = Severity * Occurrence * Detection
The RPN should be calculated for every failure mode across the entire design and/or process and documented in the FMEA.
The results reveal the most problematic areas, and the failure modes with the highest RPNs should be addressed first
when implementing corrective actions.
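The RPN calculation and ranking can be sketched in a few lines of Python. The failure modes, their ratings, and the 1 to
10 rating scale below are assumptions made for illustration; the formula itself is the one given above.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # assumed scale: 1 (negligible) to 10 (catastrophic)
    occurrence: int  # assumed scale: 1 (rare) to 10 (almost certain)
    detection: int   # assumed scale: 1 (almost always detected) to 10 (almost never detected)

    @property
    def rpn(self) -> int:
        # RPN = Severity * Occurrence * Detection
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda mode: mode.rpn, reverse=True)


# Hypothetical failure modes for a batch-driven application.
modes = [
    FailureMode("Feed arrives with missing records", severity=7, occurrence=4, detection=3),
    FailureMode("Database file system fills up", severity=9, occurrence=3, detection=2),
    FailureMode("Batch job overruns its window", severity=5, occurrence=6, detection=5),
]

for mode in rank_by_rpn(modes):
    print(f"{mode.rpn:4d}  {mode.name}")
```

In this made-up data the batch overrun ranks first even though it has the lowest severity, because it is both frequent
and hard to detect.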
Automated Health Check (AHC)
An Automated Health Check is a fast and accurate way to proactively detect and pinpoint the presence and cause of
problems that could impact the productivity of WLAN users, before those users report them. Automated health checks for
the respective technologies incorporate proactive monitoring probes into job logs or web server logs to generate
alerts, so that remedial action can begin before high-severity issues occur. AHC reduces the costs of lost user
productivity and of troubleshooting complex wireless problems.
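A monitoring probe over a job log might look roughly like the following Python sketch. The log format, the severity
keywords, and the alert threshold are all assumptions; a production probe would typically tail real log files and hand
alerts to a monitoring system.

```python
import re

# Hypothetical severity keywords; adjust to the log format actually in use.
PROBLEM_PATTERN = re.compile(r"\b(WARN|ERROR|FATAL)\b")


def scan_log(lines, threshold=3):
    """Return an alert message once `threshold` problem lines are seen, else None."""
    hits = [line.rstrip("\n") for line in lines if PROBLEM_PATTERN.search(line)]
    if len(hits) >= threshold:
        return f"ALERT: {len(hits)} warning/error lines, latest: {hits[-1]}"
    return None


# Made-up sample log used only to exercise the probe.
sample_log = [
    "2024-01-01 02:00:01 INFO job started",
    "2024-01-01 02:00:05 WARN retrying connection",
    "2024-01-01 02:00:09 ERROR connection refused",
    "2024-01-01 02:00:12 WARN retrying connection",
    "2024-01-01 02:00:20 INFO job finished",
]

print(scan_log(sample_log))
```

Alerting on a run of warnings, rather than waiting for a hard failure, is what gives the support team time to act in
advance.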
Pre-Processors
Implementing pre-processors on critical data feeds going into central systems is one proactive problem management
method. A proactive scan of these data feeds for data corruption, invalid references, and missing data prevents a
considerable number of high-severity incidents.
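As an illustration, a pre-processor for a CSV feed could separate clean rows from rejected ones before anything reaches
the central system. The feed layout (`customer_id`, `amount`) and the reference set `KNOWN_CUSTOMERS` are hypothetical.

```python
import csv
import io

# Hypothetical reference data against which feed rows are checked.
KNOWN_CUSTOMERS = {"C001", "C002"}


def validate_feed(text):
    """Split a CSV feed into clean rows and (line_number, reason) rejections."""
    clean, rejected = [], []
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not row["customer_id"] or not row["amount"]:
            rejected.append((line_no, "missing data"))
        elif row["customer_id"] not in KNOWN_CUSTOMERS:
            rejected.append((line_no, "invalid reference"))
        else:
            try:
                float(row["amount"])
            except ValueError:
                rejected.append((line_no, "corrupt amount"))
            else:
                clean.append(row)
    return clean, rejected


feed = "customer_id,amount\nC001,10.50\nC999,3.00\nC002,\nC002,abc\n"
clean, rejected = validate_feed(feed)
print(clean)
print(rejected)
```

Rejected rows can be routed back to the feed owner, so that only the clean rows are loaded.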
Self Help
Implementing ‘self-help’ in the form of user guides and FAQ documents for online applications and functionality can
significantly reduce the number of user queries and RFI-type service requests.
Automated Archiving
Implementing automated deletion or archiving prevents the loss of application availability caused by the database or
file system running out of space.
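A file-based archiving job can be sketched as follows. The 30-day retention period and the directory names are
assumptions, and the demonstration runs in a temporary directory so that it touches no real data.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def archive_old_files(source_dir, archive_dir, max_age_days=30):
    """Move files last modified more than max_age_days ago into archive_dir."""
    cutoff = time.time() - max_age_days * SECONDS_PER_DAY
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(source_dir).iterdir()):
        if path.is_file() and path.stat().st_mtime < cutoff:
            shutil.move(str(path), str(archive / path.name))
            moved.append(path.name)
    return moved


# Demonstration in a throwaway directory: one stale file, one fresh file.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "logs"
    source.mkdir()
    (source / "old.log").write_text("stale")
    (source / "new.log").write_text("fresh")
    sixty_days_ago = time.time() - 60 * SECONDS_PER_DAY
    os.utime(source / "old.log", (sixty_days_ago, sixty_days_ago))

    moved = archive_old_files(source, Path(tmp) / "archive")
    remaining = sorted(p.name for p in source.iterdir())

print(moved, remaining)
```

Scheduling a job like this to run regularly keeps space usage bounded without manual intervention.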
Perfective Maintenance Techniques For Proactive Problem Management
Perfective maintenance is the modification of a software product after delivery to improve its performance or
maintainability.
Cycle Time Reduction
Cycle time reduction is the strategy of lowering the time it takes to perform a process in order to improve
productivity. In addition, cycle time reduction often improves quality.
Cycle time for critical batch jobs can be reduced through:
- Automation
- Tuning
- Multi-threading
- Regular database maintenance
- Performing multiple activities in parallel
- Re-sequencing.
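The effect of performing independent activities in parallel can be shown with a small Python sketch. The `extract`
function is a hypothetical stand-in for an I/O-bound batch step, with a short sleep simulating the wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def extract(source):
    """Hypothetical I/O-bound batch step; the sleep simulates waiting on a remote system."""
    time.sleep(0.2)
    return f"{source}: done"


sources = ["orders", "invoices", "payments"]

# Run the steps one after another.
start = time.perf_counter()
sequential = [extract(source) for source in sources]
sequential_time = time.perf_counter() - start

# Run the same independent steps concurrently.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(sources)) as pool:
    parallel = list(pool.map(extract, sources))
parallel_time = time.perf_counter() - start

print(f"sequential {sequential_time:.2f}s, parallel {parallel_time:.2f}s")
```

The gain applies only to steps that do not depend on one another; dependent steps are candidates for re-sequencing
instead.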
Application Renovation
Application Renovation involves a brainstorming session in which the delivery team reviews current issues and
improvement ideas under the following headings:
- Automation
- Availability
- Maintenance
- Performance Optimization
- Scalability
- Stability
- Robustness
- Volatility
- Usability
- Vulnerability.
Left Shift
Left shift is one of the core components of a preventive maintenance strategy. It is a deliberate approach adopted with
the intent to:
- Reduce incident inflow for the L2 and L3 support teams
- Reduce ticket backlog
- Improve turnaround time (TAT)
- Achieve higher customer satisfaction scores through faster responses
- Deliver a cost benefit to the client, since first-line support costs less than second-line support.
Some of the areas where Left Shift can be applied include:
- Tickets that are mainly resolved by the Service Desk
- Tickets that involve a lot of manual effort
- Recurring tickets such as password resets, user ID creation, etc.
- Authorization and access management related tickets
- Routine health checks and batch job failure tickets
- Ad hoc report generation
- General admin and data change tickets
- Tickets that are resolved by educating the user.
Use of Memory Debugging Tools
A memory debugger, also known as a runtime debugger, is used for finding software memory problems such as memory leaks
and buffer overflows. These problems are caused by bugs in the allocation and deallocation of dynamic memory. Running a
memory debugging tool such as IBM Rational Purify periodically helps identify and fix memory leaks in
transaction-intensive systems.
Memory debuggers can help programmers avoid software anomalies that would exhaust the computer system's memory, thus
ensuring high reliability of the software even over long runtimes.
Finding memory issues such as leaks by hand can be extremely time consuming. Using a tool to detect memory misuse makes
the process much faster and easier.
A few memory debugging tools on the market include dmalloc (any OS), IBM Rational Purify (UNIX and Windows OS),
TotalView (Unix, Mac OS X), WinDbg (Windows OS), Daikon (Unix, Windows, Mac OS X), etc.
Removal of Bottlenecks
A bottleneck in a process occurs when input arrives faster than the next step can use it to create output. Identifying
and fixing bottlenecks is very important for reducing problems such as customer dissatisfaction, wasted resources,
high cost, high effort, insufficient resources, and poor quality of service. Root cause analysis (RCA) techniques
should be applied to identify and avoid bottlenecks.
A few ways to relieve a bottleneck include:
- Reduce the strain on the bottleneck
- Organize similar work items in batches
- Add more people or resources to increase the capacity and speed up the work
- Redesign the critical online and batch components.
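The definition of a bottleneck given above can be turned into a simple check: compare the rate at which work arrives at
each stage with that stage's capacity. The stage names, capacities, and arrival rate in this Python sketch are
hypothetical.

```python
# Hypothetical pipeline: (stage name, capacity in items per hour).
stages = [("intake", 120), ("validation", 45), ("posting", 80)]


def find_bottleneck(stages):
    """Return the stage with the lowest capacity; it caps end-to-end throughput."""
    return min(stages, key=lambda stage: stage[1])


def backlog_growth(arrival_rate, stages):
    """Items per hour piling up in front of each stage at a steady arrival rate."""
    growth = {}
    rate = arrival_rate
    for name, capacity in stages:
        growth[name] = max(0, rate - capacity)
        rate = min(rate, capacity)  # downstream stages see at most what this stage emits
    return growth


print(find_bottleneck(stages))
print(backlog_growth(100, stages))
```

With 100 items arriving per hour, work accumulates only in front of the validation stage, so that is where extra
capacity or a redesign would pay off.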
Documenting Critical Business Processes
Critical business functions are the business processes that must be restored in the event of a disruption in order to
protect the organization’s assets and meet the needs of the organization and the business. They are the processes that
are vital to business functions and to the operation of the company, that are in direct contact with the customer, or
that carry great risk if not handled properly. It is very important to take care of these processes in order to deliver
the key products and services that enable an organization to meet its objectives. Preparing end-to-end process flows
with detailed checklists for critical business processes helps improve client satisfaction and reduce escalations.